Course Information
Course Title
數據分析學與模型專題
Special Topics in Data Analytics and Modeling
Semester
105-1
Intended Audience
College of Electrical Engineering & Computer Science, Graduate Institute of Computer Science and Information Engineering
Instructor
莊炳湟
Course Number
CSIE5610
Course Identifier
922EU4350
Class Section
 
Credits
Full/Half Year
Half year
Required/Elective
Elective
Time
Thursday, periods 2–4 (9:10–12:10)
Location
資101 (Room 101, CSIE Building)
Remarks
This course is taught in English. Co-taught with 李琳山.
Enrollment limit: 45
Ceiba Course Website
http://ceiba.ntu.edu.tw/1051CSIE5610_ 
Course Introduction Video

Core Competency Mapping
No core-competency mapping has been established for this course.
Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal copies.
Course Description

Data is at the center of the so-called “fourth paradigm of scientific research,” which will spawn new sciences useful to society. Data is also a powerful new driving force behind many present-day applications, such as smart cities, manufacturing informatics, and societal security, to name a few. It is thus imperative that our students know how to handle data, analyze data, use data, and draw insights from data. This course aims to acquaint students with the analytical foundations of data-handling techniques.
The course consists of a series of lectures and seminar talks with substantial student participation, in the form of research and presentations in response to posted questions about the main topics in data analytics and modeling. It offers a balanced exposition of the what, how, and why of the subjects at the core of data analytics and modeling.
1. Scope
Broad topics covered in the course include:
• Probability distribution & parameter estimation
• Regression & curve fitting
• Data modeling: generative models, discriminative models, mixture models, latent variable models & hybrid distributions
• Data modeling for large observations: modeling of sequential data or observations of large dimensionality; hidden Markov models, Markov random fields, & graphical models
• Pattern recognition & decision theory
• Machine learning, neural networks and deep learning
We will spend about two weeks on each topic on average. Because the course is offered to students of varying technical backgrounds, it puts breadth first and depth second.
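As a small taste of the first topic above (probability distribution and parameter estimation), here is a minimal, self-contained sketch of maximum-likelihood estimation of a normal distribution's parameters from samples. The function name `gaussian_mle` and the synthetic data are illustrative assumptions, not part of the official course material:

```python
import random

def gaussian_mle(samples):
    """Return the maximum-likelihood estimates (mean, variance)
    for i.i.d. samples assumed drawn from a normal distribution."""
    n = len(samples)
    mean = sum(samples) / n
    # ML variance estimator divides by n (biased); the unbiased
    # sample variance divides by n - 1 instead.
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

# Draw synthetic data from N(mu=5, sigma=2) and recover the parameters.
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10000)]
mu_hat, var_hat = gaussian_mle(data)
```

As the sample size grows, the estimates concentrate around the true parameters (mean 5, variance 4); the quality of such estimators (bias, variance, consistency) is exactly the kind of question the A.1–A.3 presentations address.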
2. Format
For each topic, after an introductory lecture by the lecturer, individual students will be assigned to conduct research, answer generic as well as topic-specific questions, and return with presentations to the class. Each student presentation lasts 20–30 minutes, followed by roughly 10 minutes of questions and discussion. Students assigned to specific topics and questions have one week to prepare their presentations. The presented material must first make sense, both logically and technically, to the presenting student, who must then be able to “teach” the class and “convince” it of how best to learn and understand the subject matter. Generic questions applicable to any engineering topic are:
‒ What are the problems (and/or observations) that gave rise to the particular topic & concept? (The original motivation);
‒ What are the problem formulations with relevant assumptions that have been proposed? (The methodology and formulation);
‒ What ensemble of techniques was developed to solve the problem? (The tools and capabilities);
‒ How do these techniques solve the problem or contribute to the solutions? (The solution mechanism);
‒ What problems beyond the original motivation will the topic and the related techniques be able to solve? (New and novel applications);
‒ What are the limitations of the solutions proposed so far? Any remaining open problems in the topic? (Research opportunities).
Each presentation should attempt to answer these questions to the best of the presenter’s ability and the availability of the relevant material. Other topic-specific questions may also be posted and addressed in student presentations.
After a topic is addressed by student presentations, one or two commentary sessions by the lecturer will follow, completing the systematic development of the students' understanding of the topic.
The course will be conducted primarily in English. To reflect the applicability of the subject matter to local problems, local languages may also be used as circumstances warrant.

Course Objectives
Overall, students will be exposed to data analytic topics and their historical perspectives, learn to ask and analyze related problems, understand the modeling techniques and their origins, and conceive of new applications and research opportunities. 
Course Requirements
No written test will be given in this special-topics course. Student presentations are evaluated by the class and moderated by the lecturer.
Expected Weekly Study Hours Outside Class
 
Office Hours
 
Required Readings
Open literature and online information related to the assigned topics.
References
No official textbook is assigned for this course. Students are expected to conduct research using all university-provided resources (e.g., books in the library) and information available on the web. Class notes by the lecturer will be distributed in due course.
Grading
(for reference only)
Course Schedule
Week
Date
Topic
Week 1
9/15  Holiday
Week 2
9/22
– Introduction; course format & logistics; learning how to learn through the example of “spectrum”
Week 3
9/29
– Review of probability theory
– Introduction to distribution estimation
Week 4
10/06
– Student presentations on distribution estimation:
  • Estimation of parameters of the uniform distribution and the Laplacian distribution; criteria, quality of the estimator (A.1)
  • Estimation of parameters of the log-normal distribution and the Rayleigh distribution; criteria, quality of the estimator (A.2)
  • Estimation of parameters of multivariate normal and multivariate exponential distributions (A.3)
  • How the concepts of confidence interval and margin of error are applied in real-world applications – interesting examples (A.4)
– Comments and summary lecture
– Introduction to regression
Week 5
10/13
– Student presentations on regression:
  • Regression and the method of least squares: history, motivation, formulation, and real-world examples (B.1)
  • Simple regression I: linear regression, polynomial regression, generalized linear regression (B.2)
  • Simple regression II: nonlinear regression, logistic regression, non-parametric regression (B.3)
  • Multiple regression (B.4)
– Comments on error models and optimization criteria; summary
Week 6
10/20 (rescheduled to 10/14)
– Introduction to models and modeling
– Student presentations on modeling:
  • Clustering: problem formulation, algorithms, examples (C.1)
  • Vector quantization, the Lloyd algorithm, and applications (C.2)
Week 7
10/27 (rescheduled to 11/3)
– Student presentations on modeling:
  • Mixture models and distributions: definition, motivation, and examples (C.3)
  • Parameter estimation of mixture models (C.4)
  • Principal Component Analysis: what, why, and how (the tool of eigen-analysis), with examples (C.5)
  • Dimensionality reduction and PCA: subspace, signal-to-noise ratio, residual error, entropy, and retention of information (C.6)
  • Factor analysis: motivation, real-world problem examples, the tool of PCA, the idea of latent variables and latent-variable models (C.7)
– Comments and summary lecture
Week 8
11/03 (rescheduled to 11/4)
– Lecture and discussion on models and modeling
Week 9
11/10
– Student presentations on modeling:
  • Autoregressive modeling and linear prediction: motivation, rationale, model form, key results, and examples (C.8)
  • Linear prediction, inverse filtering, and spectral estimation – a linear-system perspective (C.9)
  • Markov chains and processes (C.10)
  • Markov random fields: motivation, definition, key results, applications, open problems (C.11)
– Comments
Week 10
11/17
– Student presentations on modeling:
  • Graphical models: types, relationship to other models, estimation methods, application examples, open problems (C.12)
  • Hidden Markov models: motivation, intuition, pedagogical problems (C.13)
  • Applications of hidden Markov models: speech recognition, language modeling, and more (C.14)
– Introduction to pattern recognition and optimal decision theory
Week 11
11/24
– Student presentations on pattern recognition:
  • Pattern recognition approach I: statistical methods, formulations (parametric & non-parametric), generative models and discriminative models, examples (D.1)
  • Pattern recognition approach II: syntactic methods, formulation, and examples (D.2)
  • Basic support vector machine: motivation, justification, formulation, examples, and deficiencies (D.3)
Week 12
12/01
– Student presentations on decision theory:
  • Extensions of the support vector machine (to overcome its original deficiencies), kernel methods, and other enhancements (D.4)
  • Pattern recognition applications: early examples in handwritten digit recognition (D.5)
– Introduction to machine learning
– Student presentations on machine learning:
  • Types of machine learning tasks by the nature of the data experience, with examples (E.1)
Week 13
12/08
– Student presentations on machine learning:
  • Types of machine learning tasks by output, with examples (E.2)
  • Overview of notable machine learning algorithms (E.3)
  • Decision tree learning and association rule learning (E.4)
  • Perceptrons and feedforward networks, training, error backpropagation, and applications (E.6)
– Comments and summary lecture
Week 14
12/15
– Student presentations on machine learning:
  • Recurrent networks as associative memory: Hopfield nets, self-organizing maps, their training rules, and examples (E.5)
  • Boltzmann machines (BM) and restricted Boltzmann machines (RBM): training, uses; relationships among BMs, RBMs, and generative (statistical) models (e.g., mixture models, latent-variable models, and Markov random fields) (E.7)
  • RBMs for data and dimensionality reduction (E.8)
  • Extensions of the RBM: deep belief networks and auto-encoders (E.10)
– Comments and summary lecture
Week 15
12/22
– Student presentations on machine learning:
  • From RBMs to deep neural networks (DNN): insights, implications, and open issues (E.11)
  • Recurrent networks for sequential input with memory: long short-term memory (LSTM) neural networks (E.12)
  • Application examples of DNNs (E.13)
  • Manifold learning (E.14)
– Comments and summary lecture
Week 16
12/29
– Comments, summary, and closing lecture